

Search results: All records where Creators/Authors contains "Beerel, Peter A."


  1. High-quality 3D image recognition is an important component of many vision and robotics systems. However, accurate processing of these images requires compute-expensive 3D Convolutional Neural Networks (CNNs). To address this challenge, we propose the use of Spiking Neural Networks (SNNs) that are generated from iso-architecture CNNs and trained with quantization-aware gradient descent to optimize their weights, membrane leak, and firing thresholds. During both training and inference, the analog pixel values of a 3D image are applied directly to the input layer of the SNN without the need to convert them to a spike train. This significantly reduces training and inference latency and results in a high degree of activation sparsity, which yields significant improvements in computational efficiency. However, it also introduces energy-hungry digital multiplications in the first layer of our models, which we propose to mitigate using a processing-in-memory (PIM) architecture. To evaluate our approach, we develop a 3D and a 3D/2D hybrid SNN-compatible convolutional architecture and choose hyperspectral imaging (HSI) as an application for 3D image recognition. We achieve overall test accuracies of 98.68%, 99.50%, and 97.95% with 5 time steps (inference latency) and 6-bit weight quantization on the Indian Pines, Pavia University, and Salinas Scene datasets, respectively. In particular, our models implemented on standard digital hardware achieve accuracies similar to state-of-the-art (SOTA) with ~560.6× and ~44.8× less average energy than an iso-architecture full-precision and 6-bit quantized CNN, respectively. Adopting the PIM architecture in the first layer further improves the average energy, delay, and energy-delay-product (EDP) by 30%, 7%, and 38%, respectively.
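A minimal sketch of the direct-input spiking idea described in item 1: analog pixel values are fed to the network at every time step (no spike-train conversion), and each layer uses leaky-integrate-and-fire neurons with a membrane leak and firing threshold (trainable in the paper, fixed constants here). All names, shapes, and the reset scheme are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def lif_layer(inputs, weights, leak, threshold, timesteps):
    """Leaky integrate-and-fire layer run over several time steps.

    inputs:  (timesteps, n_in) -- for the first layer these are the same analog
             pixel values repeated each step (direct encoding, no spike train)
    weights: (n_in, n_out); leak, threshold: scalars
    Returns a (timesteps, n_out) array of binary spikes.
    """
    membrane = np.zeros(weights.shape[1])
    spikes = np.zeros((timesteps, weights.shape[1]))
    for t in range(timesteps):
        membrane = leak * membrane + inputs[t] @ weights          # integrate with leak
        fired = membrane >= threshold                             # fire on threshold crossing
        spikes[t] = fired.astype(float)
        membrane = np.where(fired, membrane - threshold, membrane)  # soft reset
    return spikes

# Direct encoding: repeat the analog pixel vector at every time step (5 steps, as in the paper).
pixels = np.random.rand(64)                       # toy flattened image patch
x = np.tile(pixels, (5, 1))
out = lif_layer(x, np.random.randn(64, 16) * 0.1, leak=0.9, threshold=1.0, timesteps=5)
print(out.mean())                                 # spike rate, i.e. activation sparsity
```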
  2. Single flux quantum (SFQ) logic is a promising technology to replace complementary metal-oxide-semiconductor logic for future exascale supercomputing, but it requires the development of reliable EDA tools tailored to the unique characteristics of SFQ circuits, including the need for active splitters to support fanout and for clocked logic gates. This article is the first work to present a physical design methodology for inserting hold buffers in SFQ circuits. Our approach is variation-aware, uses common path pessimism removal and incremental placement to minimize the overhead of timing fixes, and can trade off layout area and timing yield. Compared to a previously proposed approach using fixed hold time margins, Monte Carlo simulations show that, averaged across 10 ISCAS’85 benchmark circuits, our method can reduce the number of inserted hold buffers by 8.4% with a 6.2% improvement in timing yield, or by 21.9% with a 1.7% improvement in timing yield.
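For context on the hold-fixing problem in item 2, the sketch below shows only the simplest fixed-margin version of hold-buffer insertion: if data arrives at a clocked SFQ gate before the hold window closes, buffers (each adding a known delay) are inserted until the constraint is met. The paper's actual method is variation-aware, applies common path pessimism removal, and does incremental placement; none of that is modeled here, and all delay values are made up.

```python
import math

def hold_buffers_needed(data_arrival, clock_arrival, hold_time, margin, buffer_delay):
    """Fixed-margin hold check at one clocked gate.

    Hold is met when data_arrival >= clock_arrival + hold_time + margin.
    Returns the number of delay buffers (each adding buffer_delay) to insert
    on the data path to satisfy the constraint.
    """
    slack = data_arrival - (clock_arrival + hold_time + margin)
    if slack >= 0:
        return 0
    return math.ceil(-slack / buffer_delay)

# Toy numbers in picoseconds (illustrative only).
print(hold_buffers_needed(data_arrival=4.0, clock_arrival=2.0,
                          hold_time=3.5, margin=1.0, buffer_delay=5.0))  # -> 1
```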
  3.
  4.
  5. This paper presents a dynamic network rewiring (DNR) method to generate pruned deep neural network (DNN) models that are robust against adversarial attacks yet maintain high accuracy on clean images. In particular, the proposed DNR method is based on a unified constrained optimization formulation using a hybrid loss function that merges ultra-high model compression with robust adversarial training. This training strategy dynamically adjusts inter-layer connectivity based on per-layer normalized momentum computed from the hybrid loss function. In contrast to existing robust pruning frameworks that require multiple training iterations, the proposed learning strategy achieves an overall target pruning ratio with only a single training iteration and can be tuned to support both irregular and structured channel pruning. To evaluate the merits of DNR, experiments were performed with two widely accepted models, namely VGG16 and ResNet-18, on CIFAR-10 and CIFAR-100, as well as with VGG16 on Tiny-ImageNet. Compared to the baseline uncompressed models, DNR provides over 20× compression on all the datasets with no significant drop in either clean or adversarial classification accuracy. Moreover, our experiments show that DNR consistently finds compressed models with better clean and adversarial image classification performance than what is achievable through state-of-the-art alternatives. Our models and test codes are available at https://github.com/ksouvik52/DNR_ASP_DAC2021.
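A rough sketch of the rewiring idea in item 5: pruning masks are redrawn during training, with each layer's share of a global weight budget set by a normalized per-layer momentum-like statistic of the loss gradients, and the largest-magnitude weights kept within that share. The statistic, budget allocation, and masking below are simplified assumptions; the paper's exact formulation and its adversarial-training loss are not reproduced.

```python
import numpy as np

def rewire_masks(weights, grad_momentum, global_density):
    """Re-allocate a global weight budget across layers, then keep the
    largest-magnitude weights inside each layer's budget.

    weights, grad_momentum: lists of arrays, one per layer
    global_density: fraction of all weights to keep (e.g. 0.05 for ~20x compression)
    """
    total = sum(w.size for w in weights)
    budget = int(global_density * total)
    # Normalized per-layer momentum decides each layer's share of the budget.
    scores = np.array([np.abs(m).mean() for m in grad_momentum])
    shares = scores / scores.sum()
    masks = []
    for w, share in zip(weights, shares):
        k = min(w.size, max(1, int(share * budget)))
        thresh = np.sort(np.abs(w).ravel())[-k]            # keep top-k magnitudes
        masks.append((np.abs(w) >= thresh).astype(w.dtype))
    return masks

# Toy usage: two layers, keep ~5% of the weights overall.
ws = [np.random.randn(32, 32), np.random.randn(64, 32)]
ms = [np.abs(np.random.randn(*w.shape)) for w in ws]
print([m.mean() for m in rewire_masks(ws, ms, global_density=0.05)])  # per-layer densities
```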
  6. Artificial neural networks (NNs) in deep learning systems are critical drivers of emerging technologies such as computer vision, text classification, and natural language processing. Fundamental to their success is the development of accurate and efficient NN models. In this article, we report our work on Deep-n-Cheap, an open-source automated machine learning (AutoML) search framework for deep learning models. The search covers both architecture and training hyperparameters and supports convolutional neural networks and multi-layer perceptrons, making it applicable to multiple domains. Our framework is targeted for deployment on both benchmark and custom datasets and, as a result, offers a greater degree of search-space customizability than a more limited search over only pre-existing models from the literature. We also introduce the technique of ‘search transfer’, which demonstrates the generalization capabilities of the models found by our framework to multiple datasets. Deep-n-Cheap includes a user-customizable complexity penalty that trades off performance with training time or number of parameters. Specifically, our framework can find models with performance comparable to state-of-the-art while taking 1–2 orders of magnitude less time to train than models from other AutoML and model search frameworks. Additionally, we investigate and develop insight into the search process that should aid future development of deep learning models.
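The complexity-penalty idea in item 6 can be summarized by a simple scalarized objective: each candidate configuration is scored by its validation performance plus a user-weighted complexity term (training time or parameter count). The function below is an illustrative stand-in; Deep-n-Cheap's actual objective, search space, and search machinery are defined in the paper and repository, not here.

```python
import math

def search_objective(val_loss, train_time_s, num_params, wc=0.1, penalize="time"):
    """Lower is better: validation loss plus a user-weighted complexity penalty.

    wc = 0 searches purely for performance; larger wc favors cheaper models.
    """
    complexity = math.log10(train_time_s) if penalize == "time" else math.log10(num_params)
    return val_loss + wc * complexity

# Two hypothetical candidates: a large accurate model vs. a smaller, much faster one.
print(search_objective(0.30, train_time_s=3600, num_params=20e6, wc=0.1))  # ~0.66
print(search_objective(0.34, train_time_s=300,  num_params=2e6,  wc=0.1))  # ~0.59, preferred
```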
  7. The high computational complexity associated with training deep neural networks limits online and real-time training on edge devices. This paper proposes an end-to-end training and inference scheme that eliminates multiplications by replacing them with approximate operations in the log domain, which has the potential to significantly reduce implementation complexity. We implement the entire training procedure in the log domain, with fixed-point data representations. This training procedure is inspired by hardware-friendly approximations of log-domain addition that are based on look-up tables and bit shifts. We show that our 16-bit log-based training can achieve classification accuracy within approximately 1% of the equivalent floating-point baselines for a number of commonly used datasets.
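A sketch of the log-domain arithmetic that item 7 builds on: multiplications become additions of log-magnitudes, and additions are approximated as a max plus a small correction term that can come from a look-up table (the classic log-sum trick). The fixed-point quantization, sign handling, and the full backpropagation pipeline from the paper are omitted; the helpers below are illustrative assumptions.

```python
import math

# Base-2 log domain: a positive value x is represented by lx = log2(x).

def log_mul(lx, ly):
    """Multiplication in the log domain is just addition."""
    return lx + ly

def log_add(lx, ly, lut_bits=4):
    """Approximate log2(x + y) as max plus a correction term; the correction is
    indexed by the magnitude difference and would sit in a small LUT in hardware."""
    hi, lo = max(lx, ly), min(lx, ly)
    d_q = round((hi - lo) * (1 << lut_bits)) / (1 << lut_bits)  # stand-in for a LUT lookup
    return hi + math.log2(1.0 + 2.0 ** (-d_q))

# Compare against exact floating point for a multiply-accumulate.
x, y, z = 3.0, 5.0, 0.25
exact = x * y + z
approx = 2.0 ** log_add(log_mul(math.log2(x), math.log2(y)), math.log2(z))
print(exact, approx)   # the approximation tracks the exact result closely
```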
  8. The high demand for computational and storage resources severely impedes the deployment of deep convolutional neural networks (CNNs) on resource-limited devices. Recent CNN architectures include reduced-complexity versions (e.g., ShuffleNet and MobileNet), but at the cost of modest decreases in accuracy. This paper proposes pSConv, a pre-defined sparse 2D kernel based convolution, which promises significant improvements in the trade-off between complexity and accuracy for both CNN training and inference. To explore the potential of this approach, we experimented on two widely used datasets, CIFAR-10 and Tiny ImageNet, with sparse variants of both the ResNet18 and VGG16 architectures. Our approach shows a parameter count reduction of up to 4.24× with modest degradation in classification accuracy relative to that of standard CNNs. Using a pSConv-based variant of ResNet18 whose 3 × 3 kernels have only four of nine elements not fixed at zero, our approach also outperforms a popular variant of ShuffleNet; in particular, the parameter count is reduced by 1.7× for CIFAR-10 and 2.29× for Tiny ImageNet, with an accuracy improvement of ~4%.
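A minimal sketch of the pre-defined sparse kernel idea in item 8: each 3 × 3 kernel gets a fixed binary mask with only four of the nine positions allowed to be nonzero, chosen before training and applied to the weights at every forward pass. The mask pattern and how it is applied here are illustrative assumptions, not the exact pSConv construction from the paper.

```python
import numpy as np

def make_psconv_masks(out_ch, in_ch, nonzero_per_kernel=4, seed=0):
    """Pre-defined sparsity: for every (out, in) 3x3 kernel, fix a random set of
    `nonzero_per_kernel` positions; all other taps stay at zero for the whole run."""
    rng = np.random.default_rng(seed)
    masks = np.zeros((out_ch, in_ch, 3, 3))
    for o in range(out_ch):
        for i in range(in_ch):
            idx = rng.choice(9, size=nonzero_per_kernel, replace=False)
            masks[o, i].flat[idx] = 1.0
    return masks

# Apply the fixed masks to dense weights (done at every forward pass during training).
weights = np.random.randn(8, 4, 3, 3)
masks = make_psconv_masks(8, 4)
sparse_weights = weights * masks
print(sparse_weights.reshape(8 * 4, 9)[0])   # only 4 of 9 taps are nonzero
print((sparse_weights != 0).mean())          # ~4/9 density
```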
  9. Artificial Neural Networks (ANNs) play a key role in many machine learning (ML) applications but pose significant challenges in terms of storing and computing their network parameters. Memristive crossbar arrays (MCAs) are capable of both computation and storage, making them promising for in-memory-computing-enabled neural network accelerators. At the same time, the presence of a significant number of zero weights in ANNs has motivated research into a variety of parameter reduction techniques. However, for crossbar-based architectures, the study of efficient methods to take advantage of network sparsity is still at an early stage. This paper presents CSrram, an efficient ex-situ training framework for hybrid CMOS-memristive neuromorphic circuits. CSrram includes a pre-defined block-diagonal clustered (BDC) sparsity algorithm to significantly reduce area and power consumption. The proposed framework is verified on a wide range of datasets, including MNIST handwritten digit recognition, Fashion-MNIST, breast cancer prediction (BCW), IRIS, and mobile health monitoring. Compared to state-of-the-art fully connected memristive neuromorphic circuits, CSrram, with only 25% weight density in the first junction, provides power and area efficiency improvements of 1.5× and 2.6× (averaged over the five datasets), respectively, without any significant loss in test accuracy.
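A sketch of the block-diagonal clustered (BDC) sparsity pattern from item 9: the weight matrix of a fully connected junction is restricted to square blocks along the diagonal, so only those blocks need to be mapped onto memristive crossbars. The block count and the 25% target density below are illustrative; the paper's clustering procedure, training flow, and crossbar mapping are not reproduced.

```python
import numpy as np

def bdc_mask(n_in, n_out, num_blocks):
    """Block-diagonal clustered mask: inputs and outputs are split into
    `num_blocks` groups and only same-group connections are kept."""
    mask = np.zeros((n_in, n_out))
    in_edges = np.linspace(0, n_in, num_blocks + 1, dtype=int)
    out_edges = np.linspace(0, n_out, num_blocks + 1, dtype=int)
    for b in range(num_blocks):
        mask[in_edges[b]:in_edges[b + 1], out_edges[b]:out_edges[b + 1]] = 1.0
    return mask

# 4 diagonal blocks -> roughly 25% density, matching the first-junction configuration cited above.
mask = bdc_mask(784, 128, num_blocks=4)
print(mask.mean())                             # ~0.25
weights = np.random.randn(784, 128) * mask     # only block entries would be mapped to crossbars
```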